Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning.

Authors

  • Michael A McDannald
  • Federica Lucantonio
  • Kathryn A Burke
  • Yael Niv
  • Geoffrey Schoenbaum
Abstract

In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
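
To make the distinction concrete, below is a minimal model-free TDRL sketch; the cue values, learning rate, and reward labels are illustrative assumptions, not the actual task parameters. Because the scalar prediction error compares only reward values, a switch in reward identity at matched value yields no model-free learning signal, whereas a change in quantity does:

    # Minimal model-free TD sketch (illustrative parameters, not the paper's).
    alpha = 0.1  # assumed learning rate

    def value_prediction_error(expected_value, received_value):
        # Model-free TDRL error: a scalar value comparison; reward
        # identity does not enter the computation at all.
        return received_value - expected_value

    V_visual = 1.0              # value already carried by the trained visual cue
    reward_value = 1.0          # same value, but identity switched (e.g. flavor)

    delta = value_prediction_error(V_visual, reward_value)
    V_auditory = alpha * delta  # credit assigned to the novel auditory cue
    print(delta, V_auditory)    # 0.0 0.0 -> no unblocking predicted

    delta = value_prediction_error(V_visual, 3.0)  # reward quantity increased
    V_auditory = alpha * delta
    print(delta, V_auditory)    # 2.0 0.2 -> value-driven unblocking

On this account, an auditory cue paired only with an identity switch should acquire nothing; the observation that such learning does occur, and depends on ventral striatum and orbitofrontal cortex, is what implicates a model-based representation.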


Similar articles

What is the role of orbitofrontal cortex in dopamine-dependent reinforcement learning?

Orbitofrontal cortex (OFC) has been implicated in signalling reward expectancies, but its exact role, and how it differs from the role of ventral striatum (VS), is an open question. One idea is that VS is the seat of value learning in model-free, dopamine-dependent reinforcement learning, while OFC represents values in dopamine-independent model-based learning. However, recent results [...]


Fronto-striatal organization: Defining functional and microstructural substrates of behavioural flexibility

Discrete yet overlapping frontal-striatal circuits mediate broadly dissociable cognitive and behavioural processes. Using a recently developed multi-echo resting-state functional MRI (magnetic resonance imaging) sequence with greatly enhanced signal-to-noise ratios, we map frontal cortical functional projections to the striatum and striatal projections through the direct and indirect b...


The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice.

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. ...


Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis.

Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments, the prediction error, is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic mod...
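
For reference, the prediction-error signal described here takes, in standard temporal-difference form (a textbook formulation, not one drawn from this meta-analysis itself):

    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t),
    \qquad V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t

where r_{t+1} is the received reward or punishment, \gamma a discount factor, \alpha a learning rate, and V the learned value function.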


States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning

Reinforcement learning (RL) uses sequential experience with situations ("states") and outcomes to assess actions. Whereas model-free RL uses this experience directly, in the form of a reward prediction error (RPE), model-based RL uses it indirectly, building a model of the state transition and outcome structure of the environment, and evaluating actions by searching this model. A state predicti...
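
A compact sketch of the two error signals being dissociated, under an assumed tabular toy setup (sizes, rates, and the uniform initial model are arbitrary illustrative choices): the RPE compares received value against expected value, while the SPE registers surprise about which state followed, updating the transition model that model-based evaluation searches.

    import numpy as np

    # Assumed toy setup: 3 states, 2 actions, tabular values and transition model.
    n_states, n_actions = 3, 2
    V = np.zeros(n_states)                       # model-free state values
    T = np.full((n_states, n_actions, n_states),
                1.0 / n_states)                  # model-based estimate of P(s'|s,a)
    alpha, gamma = 0.1, 0.95

    def rpe_update(s, r, s_next):
        # Reward prediction error: drives model-free value learning.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        return delta

    def spe_update(s, a, s_next):
        # State prediction error: how unexpected the observed successor
        # state was; nudges the transition model toward what was observed.
        spe = 1.0 - T[s, a, s_next]
        T[s, a] += alpha * (np.eye(n_states)[s_next] - T[s, a])
        return spe

    print(rpe_update(0, 1.0, 1))   # nonzero when the reward is unexpected
    print(spe_update(0, 1, 2))     # nonzero when the successor state is unexpected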




Journal:
  • The Journal of neuroscience : the official journal of the Society for Neuroscience

Volume 31, Issue 7

Pages: -

Publication date: 2011